ABSTRACT


An approach to digital nonlinear prediction is proposed and analyzed.
The basic relations are developed. The nonlinear operator is obtained by
quantization of the data.

The model is developed in terms of occupancy of data cells in N-space.
Extensions to increase occupancy and reduce error are formulated.
Illustrative results are included.

A comparison with linear techniques is made and over-all conclusions
on error, quantization level, length of data required, time invariance, etc.,
are provided.
1.0 INTRODUCTION


Methods to effect statistical prediction have involved polynomial
fitting and correlation and/or spectral analysis. Both can use
a minimum mean square error criterion and result in a set of weights
as an optimum linear operator. In both cases, the weights are
calculated from knowledge or computation of the pertinent statistics.

Our concern here is with a nonlinear approach which does not involve
any intermediate determination of the statistics. It also offers a
readily available means for judging error and adjusting for an improved
prediction. Being a nonlinear method, the results should be at least
as good as a corresponding linear technique.

To obtain the desired nonlinearity, a quantization of the
data is required. Of course such a quantization itself degrades the
error possible with the technique. The technique as applied to digital
simulation forms a variation on an approach discussed in Reference 1.

We  now  discuss  the  approach. 

2.0 THE MODEL

We let $t$ be the present time and $T$ the total interval of
data for processing. We let $\alpha$ be the time advance of prediction.
Without loss of generality, we take the data interval as $[0, T]$ and
consider the usable past at any $t$ from $t$ to $t - \alpha_{N-1}$. For equally
spaced sampled data $\alpha_{N-1} = (N-1)\delta$ where $\delta$ is the sampling interval.
These definitions may be summed up in the following sketch.

Author's Note

The work reported here was originally considered by the author in
the summer of 1960 while at S.T.L. Other matters prevented a proper
evaluation and summary at that time. The present report represents
a current effort to fill this need.
Let the continuous parameter time series be $X(t)$. We then estimate
$X(t+\alpha)$ by

$X^*(t+\alpha)$, the predicted value at $t+\alpha$ using $X(\theta)$, $\theta \le t$.

In the sampled finite past case we have

$X^*_N(t+\alpha)$, the predicted value at $t+\alpha$ using $X(t-\alpha_i)$, $i = 0, 1, \ldots, N-1$,

i.e. by $\{X_0, X_{-1}, X_{-2}, \ldots, X_{-(N-1)}\}$.

Although the increments $\Delta\alpha_i = \alpha_{i+1} - \alpha_i$ may not be equal, nothing
is lost by assuming so, since the average value of the $\Delta\alpha$ is constrained
by the sampling theorem. We take

$$X^*(t+\alpha) = F[X(t), X(t-\delta), \ldots, X(t-\alpha_{N-1})]$$

where $F$ is a time dependent nonlinear function. In this case we can always
make the instantaneous square error

$$e^2 = \left(X(t+\alpha) - F[X(t), X(t-\alpha_1), \ldots, X(t-\alpha_{N-1})]\right)^2 = 0.$$

However, since $F$ is to be used when $X(t+\alpha)$ is not available, it is better
not to have $F$ time dependent. Hence consider a range of times over
which a time invariant operator $F$ must minimize the mean squared error
given by

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\int_{\alpha_{N-1}}^{T-\alpha}\left(X(t+\alpha) - F[X(t), X(t-\alpha_1), \ldots, X(t-\alpha_{N-1})]\right)^2 dt$$

with $[\alpha_{N-1}, T-\alpha]$ the maximum range over which to consider error or, in
other words, over which we can distinguish a "present" value. In the
above, $\beta_\alpha = T - \alpha - \alpha_{N-1}$. As seen, the value of $X^*(t+\alpha)$ depends on the
number, $N$, of data points used in the memory.

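We replace $F$ by an infinite series of terms whose orthogonality
is invariant to $X$. Thus,

$$F = \sum_{n=-\infty}^{\infty} A_n \Phi_n[X(t), X(t-\alpha_1), \ldots, X(t-\alpha_{N-1})]$$

where, independent of $X$ values,

$$\frac{1}{T-\alpha-\alpha_{N-1}}\int_{\alpha_{N-1}}^{T-\alpha} \Phi_n[\ ]\,\Phi_m[\ ]\,dt = \begin{cases} 1, & n = m \\ 0, & n \ne m \end{cases}$$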


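Then minimum $\overline{e_\alpha^2}$ gives, using $\partial \overline{e_\alpha^2} / \partial A_n = 0$ and the orthogonality of the
$\{\Phi_n\}$,

$$A_n = \frac{1}{T-\alpha-\alpha_{N-1}}\int_{\alpha_{N-1}}^{T-\alpha} X(t+\alpha)\,\Phi_n[\ ]\,dt = \overline{X\Phi_n} = \frac{\langle X, \Phi_n\rangle}{\langle \Phi_n, \Phi_n\rangle}$$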


It is impossible, however, to have completeness using such a representation
on any $X(t)$ (i.e., for continuous valued $X$). However, quantization
of $X$ provides a realization of the desired orthogonal set for all $X(t)$.

In other words, quantizing $X$ allows construction of a $\{\Phi_n\}$ set for all $X$,
the approximation being dependent on the fineness (degree) of quantization.
As $X$ is more finely quantized, then

$$F[X(t), \ldots, X(t-\alpha_{N-1})] \rightarrow \sum_n A_n \Phi_n[X(t), \ldots, X(t-\alpha_{N-1})]$$

for any $X(t)$ with the $\{\Phi_n\}$ orthogonal and independent of $X$.


† See Reference 1
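The orthogonality depends however on the range $\beta_\alpha$ chosen. We then have

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\int_{\alpha_{N-1}}^{T-\alpha}\left(X(t+\alpha) - \sum_n A_n\Phi_n[\ ]\right)^2 dt$$

$$= \frac{1}{\beta_\alpha}\left(\int_{\alpha_{N-1}}^{T-\alpha} X^2(t+\alpha)\,dt - \sum_n \beta_n A_n^2\right)$$

$$= \frac{1}{\beta_\alpha}\left(\int_{T-\alpha-\beta_\alpha}^{T-\alpha} X^2(t+\alpha)\,dt - \sum_n \frac{\left(\int_{T-\alpha-\beta_\alpha}^{T-\alpha} X(t+\alpha)\,\Phi_n\,dt\right)^2}{\beta_n}\right)$$

with

$$\beta_n = \int_{T-\alpha-\beta_\alpha}^{T-\alpha} \Phi_n^2\,dt$$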

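We see that $A_n$ represents the projection of predicted $X$ values
in the $\Phi_n$ direction of the $\{\Phi_n\}$ space. Thus $\overline{e_\alpha^2}$ compares the true
value $\frac{1}{\beta_\alpha}\int_{T-\alpha-\beta_\alpha}^{T-\alpha} X^2(t+\alpha)\,dt$ with $\sum_n \frac{\beta_n}{\beta_\alpha} A_n^2$, where $\beta_n / \beta_\alpha$ represents the
probability of getting $A_n$ over $\beta_\alpha$. For a fixed $t$, $X(t+\alpha)$ is estimated
by a single $A_n$ for some $\Phi_n$.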

3.0 QUANTIZATION AND REALIZATION

If for each $n$, the $\Phi_n[X(t), X(t-\alpha_1), \ldots, X(t-\alpha_{N-1})] = 1$
or $0$ as a function of the N-dimensional argument of $X$ values, then
$A_n = [X, \Phi_n] / \beta_n$ is a conditional expectation. That is, $A_n$ is the average of
$X(t+\alpha)$ conditioned on the occurrence of $\Phi_n(t) = 1$ as $t$ varies over $\beta_\alpha$,
and $\beta_n$ counts the number of such occurrences. Let us now consider for
equally spaced data the N-dimensional space of $\{X(t-k\delta)\}_{k=0}^{N-1}$ values.
Let each $X(t-k\delta)$ range be divided into $Q$ quanta each of width $q$ as shown.
[Sketch: the N-dimensional space with axes $X(t), \ldots, X(t-\alpha_{N-1})$, each axis divided into $Q$ quanta of width $q$.]

Total number of quantum cells in X-space is $Q^N$.


Thus if $\Phi_n$ occurs at $t$ then we take $A_n$ as $X^*(t+\alpha)$, and $A_n$ is calculated
as the average of $X(t'+\alpha)$ values at all $t' < t$ times when $\Phi_n$ occurs. The
$\{\Phi_n\}$ set remains fixed only if the cell structure in N-space does. We
note that the orthogonality of the $\{\Phi_n\}$ is independent of the $t$ range of
integration. Also, since $\langle \Phi_n, \Phi_n\rangle = 0$ only if $\langle X, \Phi_n\rangle = 0$, the

$$A_n = \frac{\langle X, \Phi_n\rangle}{\langle \Phi_n, \Phi_n\rangle}$$

are bounded.
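The cell-averaging rule described above can be illustrated with a minimal sketch. It assumes equally spaced samples held in a Python list, cells aligned to a fixed grid of width q, and a time advance counted in whole samples; the function names (`cell_index`, `build_cell_averages`, `predict`) are hypothetical and not taken from the report.

```python
from collections import defaultdict
from math import floor


def cell_index(window, q):
    """Map N past samples to a tuple of integer quantum indices (one cell in N-space)."""
    return tuple(floor(value / q) for value in window)


def build_cell_averages(x, n_past, advance, q):
    """Average the future value over all times whose past window falls in each cell.

    Returns (averages, counts): the cell averages play the part of the A_n and
    the counts play the part of the per-cell occupancy numbers.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # t runs over every index with a full past window and a known future value.
    for t in range(n_past - 1, len(x) - advance):
        cell = cell_index(x[t - n_past + 1 : t + 1], q)
        sums[cell] += x[t + advance]
        counts[cell] += 1
    averages = {cell: sums[cell] / counts[cell] for cell in sums}
    return averages, dict(counts)


def predict(x, n_past, q, averages):
    """Return the average for the cell occupied by the latest window, or None if unoccupied."""
    return averages.get(cell_index(x[-n_past:], q))
```

With a fixed grid each window falls in exactly one cell, so the occupancy counts add up to the number of usable time points. An unoccupied cell yields no prediction, which is the occupancy problem taken up later in the text.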

The normality condition on the $\Phi_n$ over $[T-\alpha-\beta_\alpha, T-\alpha]$
depends on the $X$ values, that is, which member of the ensemble is chosen, and
the span of calculation $\beta_\alpha$. We see that $\{\Phi_n / \sqrt{\beta_n}\}$ produces an orthonormal
set; $\beta_n = [\Phi_n, \Phi_n]$.


It is useful to rewrite the formulae for $A_n$ and $\overline{e_\alpha^2}$ in a style which
reflects the cell matching imposed by the quantization as reflected by
the $\{\Phi_n\}$ set.

$$A_n = \frac{\int_{T_n} X(t+\alpha)\,dt}{\int_{T_n} dt} = \frac{1}{\beta_n}\int_{T_n} X(t_{i,n}+\alpha)\,dt$$

where $T_n = \{t_i : \Phi_n[t_i] = 1,\ t_i \in [\alpha_{N-1}, T-\alpha]\}$ so that

$$A_n = \overline{X(t+\alpha)}\ \text{conditioned on occurrence of cell } n = \frac{1}{N_n}\sum_{i=1}^{N_n} X(t_i+\alpha)\quad\text{for sampled data}$$

and

$$\sum_{n=1}^{Q^N} N_n = \beta_\alpha$$

for a fixed cell structure over $t_i \in [\alpha_{N-1}, T-\alpha]$, so that any $\{X\}_0^{N-1}$ sequence can be in no more
than one cell.

For non-multiple use of data sets $\{X\}_0^{N-1}$ but allowing overlapping cells,
the total number of matches is still $\beta_\alpha$. However, the orthogonality of
the $\Phi_n$ set and the expressions for $A_n$ and $\overline{e_\alpha^2}$ previously developed
rely on the cells in N-space being non-overlapping.

We also have

$$X^*(t+\alpha) = \sum_{1 \le n \le Q^N} A_n\Phi_n = \{A_n : \Phi_n(t) = 1,\ t \in [\alpha_{N-1}, T-\alpha]\}$$

and

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\sum_{i=1}^{\beta_\alpha}\left[X(t_i+\alpha) - A_n\right]^2$$
Incidentally we may view the quantization as a means to set up a
special set of orthogonal functions for any $X$ by writing $\Phi$ as a
product given by

$$\Phi_{n,m,\ldots}[X(t), X(t-\alpha_1), \ldots, X(t-\alpha_{N-1})] = \Phi_n[X(t)]\,\Phi_m[X(t-\alpha_1)] \cdots$$

Finally

$$e_\alpha^2 = \left(X(t+\alpha) - F[\ ]\right)^2 \approx \left(X(t+\alpha) - \sum A_n\Phi_n\right)^2$$

which can be written

$$\left(X(t+\alpha) - \sum A_n\Phi_n - \left(F[\ ] - \sum A_n\Phi_n\right)\right)^2$$

or

$$\left(X(t+\alpha) - F - \left(\sum A_n\Phi_n - F\right)\right)^2$$

These forms indicate two kinds of errors,

(1) Optimum $F$ is not sufficient, i.e. future not predictable
from the past (not usual).

(2) $\sum A_n\Phi_n$ is not a good enough approximation,
i.e. orthogonal set not complete.

For $\Phi_n$ derived via quantization, the level $q$ plays a role in (2)
and indeed contributes a quantization error as well.

Finally we remark that when $F = F(t)$, that is, the $\Phi_n$ are changing
with $t$, the number of possible $A_n$ = number of possible cells > $Q^N$.

3.1  Quantization  Error  and  Data  Calibration 

Given a joint distribution over data, we can always construct an
orthonormal function set relative to the distribution. For example,
if $X$ is normally distributed, we choose the $\Phi_n$ as Hermite polynomials.
In dealing with time series we interchange time and ensemble averages,
with time averages taken over sufficiently long data stretches,
providing the process is ergodic and so stationary. With $X$ normal,
the optimum predictor is a linear one. This result can serve as a
reference to the non-linear models.


Approximating $F$ via quantization gives

$$A_n = \langle X, \Phi_n; L\rangle$$

where $L$ is the peak to peak signal variation in the interval used.
We choose the quantum step $q$ so as to produce the desired $\overline{e^2}$ and
divide the $X$ range into $Q = L/q$ quantum steps.

[Sketch: the signal range extends $\pm L/2$ about the average value.]


Quantization thus converts $X(t)$ into a step function $S(t)$ where any
step level $S_i$ results from $X$ values given by $S_i - q/2 \le X < S_i + q/2$. The
quantization error, $e$, is a random variable with the $X$ distribution and
$-q/2 \le e < q/2$. With $q$ small enough we assume that $e$ is
approximately uniformly distributed in any one interval so that the
phase (ensemble) average of $e^2$ is given by

$$\mathrm{AVG}(e^2) = \frac{1}{q}\int_{-q/2}^{q/2} e^2\,de = \frac{2}{q}\cdot\frac{q^3}{3 \times 8} = \frac{q^2}{12} = \frac{1}{12}\left(\frac{L}{Q}\right)^2$$

If we assume 1024 divisions for + or - values (a representative value
for some A-D converters) then $Q = 2048$ and

$$\mathrm{AVG}(e^2) = \frac{L^2}{12 \cdot 2^{22}}$$


Thus writing the mean square error of quantization as $Kq^2$, $K$ a constant,
the mean square error in the prediction model satisfies $\overline{e_\alpha^2} \ge Kq^2$.
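The mean square quantization error of one twelfth of the squared step width can be checked numerically. The sketch below is only an illustration under stated assumptions: the input is drawn uniformly over a range of width L (an assumption made for the test signal, not a claim about the report's data), and the helper names are hypothetical.

```python
import random


def quantize(value, q):
    """Round a value to the nearest step level of a quantizer with step width q."""
    return q * round(value / q)


def mean_square_quantization_error(samples, q):
    """Average squared difference between each sample and its quantized level."""
    return sum((s - quantize(s, q)) ** 2 for s in samples) / len(samples)


random.seed(0)                      # fixed seed so the run is repeatable
L = 2.0                             # assumed peak-to-peak range of the signal
Q = 2048                            # number of quantum steps, as in the text
q = L / Q                           # step width
samples = [random.uniform(-L / 2, L / 2) for _ in range(100_000)]

measured = mean_square_quantization_error(samples, q)
predicted = q ** 2 / 12             # uniform-error result
```

For this test signal `measured` and `predicted` should agree to within sampling fluctuation; a signal whose amplitude distribution is far from smooth over one step would not be expected to follow the uniform-error result as closely.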
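From

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\left(\int_{T-\alpha-\beta_\alpha}^{T-\alpha} X^2(t+\alpha)\,dt - \sum_n \beta_n A_n^2\right)$$

if quantization level $q$ is too large, $A_{n_0} \rightarrow \bar{X}$ for some single $n = n_0$
and $A_n \rightarrow 0$, $n \ne n_0$, $\beta_{n_0} \rightarrow \beta_\alpha$,

$$\overline{e_\alpha^2} \sim O\left(\frac{1}{\beta_\alpha}\int_{T-\alpha-\beta_\alpha}^{T-\alpha}\left(X(t+\alpha) - \bar{X}\right)^2 dt\right) \sim O(K'q^2),\quad K' > K.$$

Of course $\overline{e_\alpha^2}$ may be large with $q$ small for operator $F$ not near
optimum.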

It is well known that in general the non-linear operator extends
bandwidth due to intermodulation of components. Also, if we consider
quantization error in N-space as a distance $d$, then

$$d_{\max} = \sqrt{\sum_{i=1}^{N} q_i^2}$$

with $q_i$ the quantum level in the $i$-th dimension.

If $q_i = q$ for all $i$,

$$d_{\max} = (Nq^2)^{1/2} = N^{1/2} q$$

For preserving constant distance (error), we have

$$q \sim \frac{1}{N^{1/2}}$$

so that the more past samples used to form a data set $\{X\}_0^{N-1}$, the
smaller the $q$ required to maintain the quantization error.
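As a small worked illustration of this scaling, the sketch below evaluates the maximum distance for equal step widths and the step width needed to hold that distance constant as the number of past samples grows. The function names are hypothetical, and the expressions are the equal-step case only.

```python
from math import sqrt


def max_cell_distance(q, n_past):
    """Maximum quantization distance in N-space when every dimension uses step q."""
    return sqrt(n_past) * q


def step_for_constant_distance(d_max, n_past):
    """Step width q that keeps the maximum distance at d_max for n_past samples."""
    return d_max / sqrt(n_past)
```

For example, going from 1 to 4 past samples at a fixed maximum distance halves the permissible step width, which in turn multiplies the number of quanta per dimension by two.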

With $\alpha$, $\beta_\alpha$, $N$, $q$ variable there is a range of possibilities with
various influences on $\overline{e_\alpha^2}$.

As $N$ increases, $q$ should decrease. With decreasing $q$ and increasing
$N$ the $\beta_n$, number of matches over $\beta_\alpha$, should drop. This means that the
number of matches of data sets $\{X(t')\}_0^{N-1}$ to some reference set
$\{X(t)\}_0^{N-1}$, $t' < t$, decreases. Thus although quantization error may be
fixed, variability of the estimate of an $A_n$ (associated with $\{X(t)\}_0^{N-1}$)
may rise due to the reduced number of matches. In general we desire
as many cell matches as possible at each $t$.

For example, unless $q$ is large (and so larger prediction errors) or
a trend sensitive parameter, such as a conditional expectation function, is
used, no reasonable prediction can be made on trend type data since
obtaining cell matches becomes difficult or impossible.

One method to increase the number of cell occupancies involves
multiple use of data and/or overlapping cell structure with time.

These procedures allow for time varying $\{\Phi_n\}$ sets and form a digression
from the theory presented above. Details of implementing such
procedures are discussed in the next section.

Another way to improve cell matching is to deal with essentially
trend free data using polynomial interpolation techniques. Simple
linear calibration may be sufficient.

The calibration of the input data can be given as

$$X(t) = a\,y(t) + b = L(y)$$

where $L$ represents a linear operation; $a$ ($\ne 0$) and $b$ are known factors
which can be time varying.

In the prediction model we operate on $X(\sigma)$, $\sigma \le t$, with $F$ to give
$X^*(t+\alpha)$ so that in a 1:1 manner we have

$$y^*(t+\alpha) = \frac{X^*(t+\alpha) - b}{a}$$
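The calibration step and its inverse can be sketched as follows. This is a minimal illustration that assumes constant scale and offset factors (the text allows them to vary with time); the function names are hypothetical.

```python
def calibrate(y, a, b):
    """Apply the linear calibration X = a*y + b to a list of raw samples."""
    if a == 0:
        raise ValueError("scale factor a must be nonzero")
    return [a * value + b for value in y]


def uncalibrate_prediction(x_star, a, b):
    """Map a predicted calibrated value back to the raw data scale."""
    if a == 0:
        raise ValueError("scale factor a must be nonzero")
    return (x_star - b) / a
```

Because the mapping is one-to-one for a nonzero scale factor, a prediction formed on the calibrated series can always be carried back to the original units.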
4.0 METHODS OF ESTABLISHING CELL OCCUPANCY


As noted above we desire models which increase cell occupancy.

This involves basically

1. defining the cell structure

2. establishing occurrence of occupancy

We now discuss three variations on dealing with these two factors.

1) This definition establishes a cell from the next new data set left
from the $\{X\}_0^{N-1}$ data sets remaining after having filled all previous
cells set up. This allows no multiple use of data and a reduced amount
of cell overlap. A data set defining a cell is taken as the center
of the cell. In order to maximize the number of cell predictions, the
data sets are used starting first with the data set associated with
the current time, $t$. The total number of sets equals $\beta_\alpha$.

2) This is a variation on (1) which allows multiple use of any
set $\{X\}_0^{N-1}$ for matching except those previously used as cell centers.
Thus both multiple use of data and cell overlap occur. We have

$$K\ (\text{number of cells set up}) \le \beta_\alpha \le \sum_{j=1}^{K} d_{n_j} = \text{total number of sets used in the } K \text{ set-up cells}$$

where $d_{n_j}$ = number of matching $\{X\}_0^{N-1}$ sets
into the cell set up at $n_j$.

If we could essentially assume no cell overlap, that is, a fixed
$\{\Phi_n\}$ set (fixed cell structure), then for (1) and (2) we would have

$$\overline{e_\alpha^2} = \frac{1}{K}\sum_{j=1}^{K}\left(X(t_{i_j}+\alpha) - A_{i_j}\right)^2,\qquad A_{i_j} = \frac{1}{d_{i_j}(\alpha)}\sum_{i_j=1}^{d_{i_j}(\alpha)} X(t_{i_j}+\alpha)$$



3) Another variation takes all past sets as cell centers so
that $K = \beta_\alpha$. This method gives more cell overlap and multiple
use of data but maximizes cell occupancy at each $t$ in $[\alpha_{N-1}, T-\alpha]$
and places the center of the cell for $t$ at $\{X(t)\}_0^{N-1}$ at each $t$.
Briefly, in summary, (1) tends to minimum occupancy (overlap) with
the maximum of these at current time $t$, while (3) produces maximum
occupancy (overlap). We call Approach 3) Method A, and Approach 1)
together with cells held non-overlapping Method B. We first
discuss Method B for $\overline{e_\alpha^2}$ and $A_n$. We also include another
variation called Model C.

4.1  Method  B 

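With fixed cell structure, that is, $\{\Phi_n\}$ set fixed, we have from
before,

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\left(\int_{T-\alpha-\beta_\alpha}^{T-\alpha} X^2(t+\alpha)\,dt - \sum_n \frac{[X(t+\alpha), \Phi_n]^2}{\beta_n}\right)$$

with sampled (discrete time series) data

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\sum_{k=1}^{K}\left(\sum_{j=1}^{n_k}\left(X(t_{jk}+\alpha) - A_{n_k}\right)^2\right);\qquad \beta_\alpha = \sum_{k=1}^{K} n_k(\alpha)$$

with

$$A_{n_k} = \frac{1}{n_k}\sum_{j=1}^{n_k} X(t_{jk}+\alpha)$$

so that

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\left(\sum_{n=1}^{\beta_\alpha} X^2(t_n+\alpha) - \sum_{k=1}^{K}\frac{\left[\sum_{j=1}^{n_k} X(t_{jk}+\alpha)\right]^2}{n_k}\right)$$

The matchings (for each $\alpha$) to the cell set up by past set $\{X(T)\}_0^{N-1}$
taken at present time $T$ (i.e., $n_T$) produce $A_{n_T}$ as $X^*(T+\alpha)$ for each $\alpha$.

4.2 Method A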

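As noted, the method sets up each of the $\beta_\alpha$ data sets as centers
for cells. The approach thus allows multiple use of any data set
$\{X\}_0^{N-1}$ and overlapping cell structures.

Now,

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\sum_{n=1}^{\beta_\alpha}\left[X(t_n+\alpha) - A_n\right]^2 = \frac{1}{\beta_\alpha}\sum_{n=1}^{\beta_\alpha}\left[X^2(t_n+\alpha) - 2X(t_n+\alpha)A_n + A_n^2\right]$$

for each of the $\beta_\alpha$, $n = 1, 2, \ldots, \beta_\alpha$, where we take

$$A_n = \frac{1}{\ell_n(\alpha)}\sum_{j=1}^{\ell_n(\alpha)} X(t_{jn}+\alpha);\qquad \sum_{n=1}^{\beta_\alpha} \ell_n \ge \beta_\alpha$$

Then, without orthogonality effects allowed since the $\{\Phi_n\}$ sets
correspond to overlapping cells,

$$\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\sum_{n=1}^{\beta_\alpha}\left[X^2(t_n+\alpha) - 2X(t_n+\alpha)\frac{1}{\ell_n}\sum_{j=1}^{\ell_n} X(t_{jn}+\alpha) + \left(\frac{1}{\ell_n}\sum_{j=1}^{\ell_n} X(t_{jn}+\alpha)\right)^2\right]$$
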
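A minimal sketch of the Method A error calculation follows. It assumes equally spaced samples in a list, a time advance counted in whole samples, a cell taken as a cube of width q centred on each data set, and a match count that includes the centre itself; these choices and the function names are illustrative assumptions rather than details stated in the report.

```python
def windows(x, n_past, advance):
    """Return (t, past window) for every t with a full past and a known future value."""
    return [(t, x[t - n_past + 1 : t + 1])
            for t in range(n_past - 1, len(x) - advance)]


def matches(center, candidate, q):
    """True when the candidate lies inside the cell of width q centred on `center`."""
    return all(abs(c - v) <= q / 2 for c, v in zip(center, candidate))


def method_a_error(x, n_past, advance, q):
    """Mean square error over all data sets, each one used as a cell centre."""
    sets = windows(x, n_past, advance)
    total = 0.0
    for t_n, center in sets:
        # Every data set may match many centres, so data are used multiply
        # and cells overlap.
        futures = [x[t + advance] for t, w in sets if matches(center, w, q)]
        a_n = sum(futures) / len(futures)   # never empty: the centre matches itself
        total += (x[t_n + advance] - a_n) ** 2
    return total / len(sets)
```

The double loop costs on the order of the square of the number of data sets, which is acceptable for a sketch but would call for an indexed search on long records.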
4.3 Model C


In order to increase further the number of occupancies and so
hopefully reduce $\overline{e_\alpha^2}$ averaged over $\alpha$ for $\alpha_{\min} \le \alpha \le \alpha_{\max}$,† for the
cell specified at the current $T$ value, we consider sequentially
$\{X(T-j\delta)\}_0^{N-1}$ as potentially equivalent $\{X(T)\}_0^{N-1}$ data sets with
adjusted $\alpha$ values. This is valuable when $\{X(T)\}_0^{N-1}$ has none or
few matches. For $\alpha$ the data sets then used for equivalent prediction
points are those at $t \le T-\alpha$. We designate the corresponding data
sample numbers as $n_T$ and $n_{T_j}$, where $t_j = T - \alpha - j\delta$ and $\delta$ is the data
spacing.

Thus, at $\alpha$ using the set at $n_{T_j}$, we use $\alpha_j = \alpha + j\delta$, $j = 0, 1, \ldots, R$,
and look for occupancy of the cell determined by $\{X(T-j\delta)\}_0^{N-1}$ by the
sets $\{X(t)\}_0^{N-1}$, $t < T - j\delta$, that is, for $n < n_{T_j}$.

At $j$ the maximum $\alpha$ value possible, as in previous methods, is determined
by $\alpha_{\max_j} = n_{T_j} - n$, where $n$ are the matching data sets $\{X\}_0^{N-1}$
to $\{X(T-j\delta)\}_0^{N-1}$. The minimum $\alpha$ is $\alpha_{\min_j} = \delta$.

We take

$$e_j = \sum_{\alpha_j = \alpha_{\min_j}}^{\alpha_{\max_j}} e_{\alpha_j}$$

for each $j$.

Then we choose the $n_{T_j}$ and associated matching sets which give $\min_j e_j$
and calculate the $A_n$ set for the $\alpha_{\min_j} \le \alpha_j \le \alpha_{\max_j}$ corresponding to
the $\alpha$. If $\alpha_{\max_j} < \alpha_{\max}$, then for $n_{T_l} < n_{T_j}$ we choose the next lowest $e_l$
which has $\alpha_{\max_l} > \alpha_{\max_j}$. The process is repeated until $\alpha_{\max_j} = \alpha_{\max}$
or the $j$ range is depleted.


† Of course increased occupancy must be weighed against basically higher
$e_\alpha$ as $\alpha$ increases.
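4.4 Adaptive Control


As we have seen, parameters $N$, $q$, $\beta_\alpha$ influence $\overline{e_\alpha^2}$. It is thus
useful to have a criterion, even if only crude, by which to judge such
effects.

We have noted the error from amplitude resolution in the sampling
process is $q_0^2/12$, where $q_0$ is the amplitude resolution or imposed
quantization range $q$. For example, with binary data and $q = 2^m$,
the error is $4^{m-1}/3$.

Referenced to the original (unquantized) data before sampling
and excluding other error sources, the total error $\overline{e^2_{TOT}}$ is approximately
$\overline{e_\alpha^2} + q^2/12$ with

$$0 \le \overline{e_\alpha^2} \le \frac{1}{\beta_\alpha}\int_{\alpha_{N-1}}^{T-\alpha} X^2(t+\alpha)\,dt = \frac{1}{\beta_\alpha}\sum_{n=1}^{\beta_\alpha} X^2(t_n+\alpha)$$

Thus, we may usually expect

$$\frac{q^2}{12} \le \overline{e^2_{TOT}} \le \frac{q^2}{12} + \frac{1}{\beta_\alpha}\sum_{n=1}^{\beta_\alpha} X^2(t_n+\alpha)$$

With $\overline{e_\alpha^2} = \frac{1}{\beta_\alpha}\left(\sum_{n=1}^{\beta_\alpha} X^2(t_n+\alpha) - \sum_k \beta_k A_k^2\right)$ and $q^2/12$ not truly independent
errors, since $\overline{e_\alpha^2}$ depends on $q$, we might have $\overline{e^2_{TOT}} > \overline{e_\alpha^2}$ for example.

From

$$\overline{e_\alpha^2} = \overline{X^2(t+\alpha)} - \frac{1}{\beta_\alpha}\sum_{k=1}^{K}\frac{\left[\sum_{j=1}^{n_k} X(t_{jk}+\alpha)\right]^2}{n_k}$$

as $n_k \rightarrow 0$ (no cell matching),

$$\overline{e_\alpha^2} \rightarrow \overline{X^2(t+\alpha)}.$$

We may then consider using the values $q^2/12$ and $\overline{X^2(t+\alpha)}$ as bounds for
testing $\overline{e_\alpha^2}$ when changing $q$, $N$, $\beta_\alpha$ to determine acceptability or for
improving $\overline{e_\alpha^2}$ to obtain $X^*(T+\alpha)$.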

4.5 Population of Cell Occupancies

With $Q$ steps of magnitude $q$ and $N$ past point data sets there
are $Q^N$ possible cells (i.e., $A_n$'s using fixed structure). The percentage
actually used is much smaller, say less than 1% on finite
sections of data. In digital simulation it is not required to store
all the possible or even expected $\{A_n\}$ set.

For example, with $N = 2$ and independent normally identically distributed
data at $X_n$, $X_{n-1}$ for all $n$ with a standard deviation of $\sigma$,
the number of $A_n$ drops from $Q^2$ to $3\sigma(2Qq - 3\sigma)$, $qQ/2 > 3\sigma$. An estimate
of $3\sigma$ can be taken as the largest amplitude increment between successive
samples.

Runs have been made on data to determine and remove trends using
data types such as

(1) monotonic increasing data of a component of velocity sampled
at 10 times per second.

(2) periodic high frequency type data from accelerometer residuals
sampled at 10 times per second.

(3) radar trajectory position data at 10 times per second.

A definite $X^*(t+\alpha)$ versus $q$ relationship was difficult to
establish for various ranges of $N$ and $\beta_\alpha$.

4.6 Comparison of Methods A and B


It is convenient to display errors for comparison by

% rms error = % $(\overline{e_\alpha^2})^{1/2}$ = % $\sqrt{\overline{e_\alpha^2}}$

and % actual error = % $e^*$ = $X^* - X$


Due to the flexibility of centering cells, it is expected that A
gives better $X^*$ (low % $e^*$) than B, since cell occupancy
is more likely.

It is recalled that $\overline{e_\alpha^2}$ was taken with respect to the prediction ($X^* = A_{n_T}$)
point sample $n_T$ only, while in Method A $\overline{e_\alpha^2}$ was calculated over matches
not only to the data set for $n_T$ but over matches to data sets for all $n$ used.

It is suggested that a revised Method A, say A', be examined to remedy
these situations by (1) allowing actual predictions to be calculated
over the data range and not just at $n_T$ and (2) calculating $\overline{e_\alpha^2}$ at
each prediction point, $n_{T'}$, from matches of past data sets to the data
set at $n_{T'}$ only. At each $n_{T'}$, $\{X(T')\}_0^{N-1}$ is considered fixed, allowing

$$A_n = \frac{\langle X, \Phi_n\rangle}{\langle \Phi_n, \Phi_n\rangle}$$

to be used with the cell center fixed by $\{X(T')\}_0^{N-1}$, which
shifts with $T'$ to make the $\{A_n\}$ set time varying with $n_{T'}$.

5.0 GENERAL LINEAR PREDICTOR

"General" is taken to mean no assumption of stationarity. The
use of the general linear predictor provides a basis of comparison
for the nonlinear predictor.
For continuous (nonquantized) data the form of the basic system
of equations for determining the coefficients of the optimum linear
predictor (operator) is

$$\int_{\alpha_{N-1}}^{T-\alpha} X(t-\tau_n)\,X(t+\alpha)\,dt = \sum_{m=0}^{N-1} a_m\left(\int_{\alpha_{N-1}}^{T-\alpha} X(t-\tau_n)\,X(t-\tau_m)\,dt\right),\quad n = 0, 1, \ldots, N-1$$

which for sampled data becomes (with a slight abuse of notation)

$$\sum_{k=0}^{\beta_\alpha} X_{k-n}X_{k+\alpha} = \sum_{m=0}^{N-1} a_m\left(\sum_{k=0}^{\beta_\alpha} X_{k-n}X_{k-m}\right)$$

It can be noted that an analogous form of equation appropriate to the
nonlinear method is given by

$$\sum_{k=0}^{\beta_\alpha} \Phi_n X_{k+\alpha} = \sum_{m=0}^{M} C_m \sum_{k=0}^{\beta_\alpha} \Phi_n(X_k, \ldots)\,\Phi_m(X_k, \ldots);\quad n = 0, 1, \ldots, M$$

If we define $R(\xi, \eta) = \sum_{k=0}^{\beta_\alpha} X_{k-\xi}X_{k-\eta}$ we have

$$\{R(n, -\alpha)\} = ((R(n, m)))\,\{a_m\}$$

with dimensions $(n \times 1) = (n \times m)(m \times 1)$, so that

$$\{a_m\} = ((R(n, m)))^{-1}\,\{R(n, -\alpha)\}$$

with $((R(n, m)))$ and $\{R(n, -\alpha)\}$ both functions of $\alpha$ and the point from
which prediction is made.
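The sampled-data normal equations can be sketched in a few lines. The example below forms the correlation sums directly from a list of samples and solves for the weights by Gaussian elimination; it assumes a nonsingular correlation matrix and a time advance counted in whole samples, and all function names are hypothetical.

```python
def correlation(x, lag_i, lag_j, start, stop):
    """Sum of X[k - lag_i] * X[k - lag_j] over start <= k < stop."""
    return sum(x[k - lag_i] * x[k - lag_j] for k in range(start, stop))


def solve(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def linear_predictor_weights(x, n_past, advance):
    """Weights a_m minimizing the summed squared prediction error over the record."""
    start, stop = n_past - 1, len(x) - advance
    r_matrix = [[correlation(x, n, m, start, stop) for m in range(n_past)]
                for n in range(n_past)]
    # A negative lag of -advance selects the future sample X[k + advance].
    r_vector = [correlation(x, n, -advance, start, stop) for n in range(n_past)]
    return solve(r_matrix, r_vector)


def linear_predict(x, weights, t):
    """Prediction of X[t + advance] from X[t], X[t-1], ..., using the given weights."""
    return sum(w * x[t - m] for m, w in enumerate(weights))
```

On a pure ramp with two past samples and an advance of one sample, the weights come out close to 2 and -1, which is linear extrapolation. The sums are recomputed for every prediction point and every advance, consistent with the remark that both the matrix and the right-hand side depend on them.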

With allowance for variable $\alpha$, a comparison with the nonlinear
method indicates that for the same amount of information, linear
processing time is higher by a factor as high as 10.

We remark that calculations at data rates higher than that provided
can be afforded in all processing modes by a sliding Lagrangian
fit such as a cubic.

Finally, comparisons are for all prediction points having one or
more matches in the nonlinear processing. The rms values of
actual errors taken over these points are also found useful.


6.0 OBSERVATIONS, REMARKS AND CONCLUSIONS

It was found useful to use

$\bar{e}_\alpha$ = r.m.s. value of observed error, calculated over matches
associated with a given point of prediction,

and

$\hat{e}_\alpha$ = r.m.s. value of actual error $X^* - X$, averaged over all $X^*$
available as the point of prediction is changed.

6.1 Linear

For small $\alpha$, the average error increases with $N$, as should
be expected since the linear operator becomes potentially more limited
as the extent of the past (memory) increases. The error $\bar{e}_\alpha$ increases
by a factor of about 1.6 as $\alpha$ goes from 1 to 2.

6.2 Nonlinear

For probability of occupancy as low as 5%, the $\hat{e}_\alpha$ remain reasonably
fixed. For $q$ low enough, $\bar{e}_\alpha$ decreases with increased $N$ as the
occupancy probability drops. In other words, at each prediction point
the decreasing $\bar{e}_\alpha$ measures the degree of correct selectivity. As $N$
increases with $q$ high enough, large variation in actual error $X^* - X$
is allowed. Indeed $\hat{e}_\alpha$ decreases with increasing $N$ while $\bar{e}_\alpha$ increases
by a factor of 1.5 as $\alpha$ goes from 1 to 2. As $q$ decreases, the $\hat{e}_\alpha$ and
$\bar{e}_\alpha$ agree. As $\alpha$ increases, the $q$ value necessary for this agreement
increases.

For low $N$ and $\alpha$ (say 1), $\hat{e}_\alpha$ can decrease with $\beta_\alpha$ while $\bar{e}_\alpha$
increases for the same $q$. Both $\bar{e}_\alpha$ and $\hat{e}_\alpha$ reduce with increased $f_s$
(sampling rate) for the same effective $\beta_\alpha$, $N$, $\alpha$. Also, $\bar{e}_\alpha$ and $\hat{e}_\alpha$
appear relatively insensitive to the interval between prediction points
or to the over-all span of these points, the latter effect relying on
the degree of stationarity present.

6.3 Linear Versus Nonlinear (for a given data sampling rate)

Although the linear procedure requires no quantization, comparisons
with the nonlinear results are more equitable when this discrepancy
is taken into account. Basically, unless a
sufficiently low $q$ can be used, the $\hat{e}_\alpha$ values from linear processing
are lower, as was the case for $\alpha = 1, 2$ and all the $N$, $\beta_\alpha$ and data used.

As $q$ lowers to favor the nonlinear error, the probability of a prediction
occurring falls sharply for the $\beta_\alpha$ range used.

The sample conditional expectation obtained with finite length
records and the lack of completeness with $q$ values > 0 result in gaps
in the determination of the conditional expectation on which, indeed,
the prediction is based.
A first and crude step toward overcoming this deficiency can be
taken via a two point interpolation as follows. (More exactly, an
N-dimensional interpolation on the data sets $\{X(t)\}_0^{N-1}$ of nearest
matches could be made to specify $X^*(t+\alpha)$ associated with $\{X(T)\}_0^{N-1}$.)

We let $n_T$ be the count corresponding to time $T$ as before and
$\{X\}_{n_1}$ and $\{X\}_{n_2}$ be the two N-dimensional data sets having the two
smallest distances $d_1$ and $d_2$ ($\ge d_1$) respectively in N-space from
$\{X\}_{n_T}$. We take

$$X^*(T+\alpha) = X(t_1+\alpha) + \frac{d_1}{d_1 + d_2}\,\Delta X$$

with

$$\Delta X = X(t_2+\alpha) - X(t_1+\alpha)$$
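The two point interpolation can be sketched as follows. The example assumes Euclidean distance in N-space, a time advance counted in whole samples, and that the current data set is the last full window of the list; the function name is hypothetical.

```python
from math import dist


def two_point_prediction(x, n_past, advance):
    """Interpolate between the futures of the two past data sets nearest the current one."""
    t_now = len(x) - 1
    current = x[t_now - n_past + 1 : t_now + 1]
    candidates = []
    # Only past data sets whose future value is already known can be used.
    for t in range(n_past - 1, t_now - advance + 1):
        window = x[t - n_past + 1 : t + 1]
        candidates.append((dist(current, window), t))
    if len(candidates) < 2:
        raise ValueError("need at least two past data sets with known futures")
    candidates.sort()
    (d1, t1), (d2, t2) = candidates[:2]
    if d1 + d2 == 0:
        return x[t1 + advance]      # both neighbours coincide with the current set
    delta_x = x[t2 + advance] - x[t1 + advance]
    return x[t1 + advance] + d1 / (d1 + d2) * delta_x
```

Because the weight on the second neighbour never exceeds one half, the result always lies between the two neighbours' future values and closer to that of the nearest one. It therefore fills gaps in the sample conditional expectation but does not extrapolate a trend.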

The minimum mean square error criterion results in the predicted value
$X^*(t+\alpha)$ being given by a quantized version (for determining cell
occupancy) of the sample conditional expectation $E(X(t+\alpha) \mid C)$, where $C$
is the cell whose center is determined by $\{X(T)\}_0^{N-1}$, rather than the
non-quantized version $E(X(t+\alpha) \mid \{X(T)\}_0^{N-1})$. We have

$$X^*(t+\alpha) = E(X(t+\alpha) \mid C) = \int X(t+\alpha)\,P(X(t+\alpha) \mid C)\,dX$$

and for sampled data

$$X^*(t+\alpha) = \sum X\,P_\alpha(X \mid C)$$

With a finite stretch of data we must consider

$$X^* = \sum_{k=1}^{m} X_k\,\frac{K_{\alpha,k,c}}{\ell_{\alpha,c}}$$

where $K_{\alpha,k,c}$ = number of times an $X(t+\alpha)$ value from the $\beta_\alpha$ sequence
of $X$ values lies within the $(X_k, X_{k+1})$ range given that its corresponding
past data set $\{X(t)\}_0^{N-1}$ enters cell $C$.

$\ell_{\alpha,c}$ = number of past data sets which enter cell $C$ over the
$\beta_\alpha$ sequence of data points.

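We may note here the various first order, joint and conditional
probabilities as follows (shortened notation on the right):

$$P_\alpha[X_k \le X < X_{k+1} \mid C] = \frac{K_{\alpha,k,c}}{\ell_{\alpha,c}} = P_\alpha(X \mid C)$$

$$P_\alpha(C) = \frac{\ell_{\alpha,c}}{\beta_\alpha}$$

$$P_\alpha[X_k \le X < X_{k+1},\ C] = \frac{K_{\alpha,k,c}}{\beta_\alpha} = P_\alpha(C)\,P_\alpha(X \mid C) = P_\alpha(X, C)$$

$$P_\alpha[X_k \le X < X_{k+1}] = \frac{K_{\alpha,k}}{\beta_\alpha} = P_\alpha(X)$$

$$P_\alpha[C \mid X_k \le X < X_{k+1}] = \frac{K_{\alpha,k,c}}{K_{\alpha,k}} = \frac{P_\alpha(X, C)}{P_\alpha(X)} = P_\alpha(C \mid X)$$

where $K_{\alpha,k,c}$ and $\ell_{\alpha,c}$ are defined above and $K_{\alpha,k}$ is the number of
times an $X$ value in the $\beta_\alpha$ sequence of data points lies within the
range $(X_k, X_{k+1})$.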

In Method A', we have over a range of prediction points

$$\sum_c P_\alpha[X_k \le X < X_{k+1},\ C] \ne P_\alpha(X_k \le X < X_{k+1})$$

since the cells overlap. Indeed

$$\sum_c P_\alpha[X_k \le X < X_{k+1},\ C] - P_\alpha(X_k \le X < X_{k+1}) = \frac{\sum_c K_{\alpha,k,c} - K_{\alpha,k}}{\beta_\alpha}$$

However,

$$\sum_k P_\alpha(X_k \le X < X_{k+1},\ C) = P_\alpha(C)$$

Another approach, of course, for determining $X^*(t+\alpha)$ could be based
on the maximum likelihood estimator, that is, on $\max P_\alpha(X(t+\alpha) \mid C)$,
where $C$ is the cell determined by $\{X(T)\}_0^{N-1}$; then $X^*(t+\alpha)$ can be taken
as $X_{k+1/2}$, where $k$ produces the maximum value of $K_{\alpha,k,c}$.
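The two estimators built on the within-cell histogram can be compared with a short sketch. It assumes equal-width amplitude bins, takes each bin's midpoint as its representative value, and breaks ties for the most populated bin in favour of the lowest bin; these choices and the function names are illustrative assumptions.

```python
from collections import Counter
from math import floor


def cell_histogram(futures, bin_width):
    """Count, per amplitude bin, the future values whose past data set entered the cell."""
    return Counter(floor(value / bin_width) for value in futures)


def conditional_mean_estimate(hist, bin_width):
    """Sample conditional expectation using bin midpoints weighted by relative counts."""
    total = sum(hist.values())
    return sum((k + 0.5) * bin_width * count for k, count in hist.items()) / total


def maximum_likelihood_estimate(hist, bin_width):
    """Midpoint of the most populated bin (lowest bin wins a tie)."""
    k_best = max(hist, key=lambda k: (hist[k], -k))
    return (k_best + 0.5) * bin_width
```

When the within-cell distribution is skewed the two estimates differ: the conditional mean is pulled toward outlying future values, while the most populated bin is not.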